Learning to Operate an Excavator via Policy Optimization
Authors
Abstract
Similar resources
Learning to Operate Anaerobic Bioreactors
This chapter describes the main parameters and the different techniques used to characterize an anaerobic bioreactor and how to apply the obtained information to learn how to operate it. In this kind of biological system, the large number of processes, operating parameters and removed compounds involved makes it necessary to combine knowledge from both Microbiology and Chemical Reactor Engineerin...
Enhanced Delta-tolling: Traffic Optimization via Policy Gradient Reinforcement Learning
The prospect of widespread deployment of autonomous vehicles invites the reimagining of the multiagent systems protocols that govern traffic flow in our cities. One such possibility is the introduction of micro-tolling for fine-grained traffic flow optimization. In the micro-tolling paradigm, different toll values are assigned to different links within a congestable traffic network. Self-intere...
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addre...
Learning to Cooperate via Policy Search
Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environm...
Learning to Abstain via Curve Optimization
In practical applications of machine learning, it is often desirable to identify and abstain on examples where a model’s predictions are likely to be incorrect. We consider the problem of selecting a budget-constrained subset of test examples to abstain on, with the goal of maximizing performance on the remaining examples. We develop a novel approach to this problem by analytically optimizi...
Journal
Journal title: Procedia Computer Science
Year: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2018.10.301